This is the first notebook in this project.

This notebook details how I explored my interest in aviation data.

Flight operation data were obtained from the United States Department of Transportation.

Now that the data is imported, I want to state the scope, data limitations, and objectives:

  1. I was originally only interested in the three main airports serving New York City: La Guardia, Newark, and John F. Kennedy. However, complete datasets for flights originating from Newark Liberty Airport were not available, so I shifted my area of investigation to Los Angeles International Airport (LAX).
  2. The dataset is limited to domestic flights originating and terminating within the United States.
  3. I am only interested in departure data; that is, flights leaving the airport under study.
  4. Whether there are any underserved states/destinations
  5. Any trends or observations from the data
  6. Whether I am able to create a supervised learning model that can predict flight delays
  7. The concept of a “hub captive”
# Keep only flights departing from LAX
LAX_outbound_flight_data <- raw_flight_data %>%
  filter(ORIGIN_AIRPORT_ID %in% LAX_airport)

The following table summarises the routes flown from LAX in the first six months of 2019, ranked from most to least frequent.

flight_destinations_by_airport <- LAX_outbound_flight_data %>%
  group_by(ORIGIN, DEST) %>%
  summarise(number_of_flights = n(),
            mean_distance = mean(DISTANCE)) %>%
  arrange(desc(number_of_flights))
datatable(flight_destinations_by_airport)

There is another way to look at it. New York City is served by three airports, and it is not the only major city served by more than one airport. Grouping destinations by city rather than by individual airport gives the following picture:

flight_destinations_by_city <- LAX_outbound_flight_data %>%
  group_by(ORIGIN, DEST_CITY_NAME) %>%
  summarise(number_of_flights = n(),
            mean_distance = round(mean(DISTANCE),2)) %>%
  arrange(desc(number_of_flights))
datatable(flight_destinations_by_city)

Or we can group by what the Department of Transportation defines as a destination city market.

A random thought: Can destination city markets be represented as Thiessen polygons and the distance to the nearest airport be mapped/calculated?
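The second half of that question does not require building the polygons. Assigning a location to its nearest airport is the same as placing it in that airport's Thiessen (Voronoi) polygon, provided both use the same distance measure. Below is a minimal base-R sketch. It assumes great-circle (haversine) distance on a spherical Earth with a radius of 6,371 km. The airport coordinates are approximate and only for illustration. `haversine_km()` and `nearest_airport()` are hypothetical helpers, not functions from any package.

```r
# Great-circle distance in km between points given in decimal degrees (haversine formula)
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Approximate coordinates of some Los Angeles area airports (illustrative only)
airports <- data.frame(
  code = c("LAX", "BUR", "LGB", "SNA", "ONT"),
  lat  = c(33.9425, 34.2007, 33.8177, 33.6757, 34.0560),
  lon  = c(-118.4081, -118.3587, -118.1516, -117.8682, -117.6012),
  stringsAsFactors = FALSE
)

# For each query point, return the nearest airport (the Thiessen polygon it falls in)
# and the great-circle distance to it
nearest_airport <- function(lat, lon, airports) {
  rows <- lapply(seq_along(lat), function(i) {
    d <- haversine_km(lat[i], lon[i], airports$lat, airports$lon)
    j <- which.min(d)
    data.frame(nearest = airports$code[j], distance_km = d[j], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Example: two points in the Los Angeles area (approximate coordinates)
nearest_airport(lat = c(34.0522, 34.1478), lon = c(-118.2437, -118.1445), airports)
```

With real data, the query points could be population centroids within each destination city market. The airports would then be those that DOT assigns to that market.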

flight_destinations_by_market <- LAX_outbound_flight_data %>%
  group_by(ORIGIN, DEST_CITY_MARKET_ID) %>%
  summarise(number_of_flights = n(),
            mean_distance = round(mean(DISTANCE),2)) %>%
  arrange(desc(number_of_flights)) %>%
  inner_join(.,city_market, by = c("DEST_CITY_MARKET_ID" = "Code"))
datatable(flight_destinations_by_market)

The distribution of the number of flights per destination city market can be shown as a histogram.

frequency_histogram <- ggplot(flight_destinations_by_market, aes(number_of_flights)) + 
  geom_histogram(binwidth = 200)
ggplotly(frequency_histogram)

Another question: Just because Los Angeles is on the West Coast, does it mean that it mainly serves West Coast and American Southwest destinations?

frequency_distance_scatterplot <- ggplot(flight_destinations_by_market, aes(x = mean_distance, y = number_of_flights)) + 
  geom_point()
ggplotly(frequency_distance_scatterplot)

Apparently not. But what if we look at the spatial distribution of destinations connected to Los Angeles at the state level?

It appears that states adjacent to California tend to receive more flights than those farther away. However, there are some exceptions, such as IL, GA, FL, NY, and HI. IL, NY, GA, and FL are home to hubs of the three main US network carriers (American, Delta, and United). HI is different: its high degree of connectivity with LAX reflects its geographical isolation, with California being the nearest state. As a result, Alaska, American, United, Delta, and Hawaiian all use LAX as a gateway to Hawaii, drawing on the large numbers of feeder flights from across the United States into each carrier's hub terminal at LAX.

# Create map
# Aggregate flights by state
lax_destinations_state <- LAX_outbound_flight_data %>%
  group_by(DEST_STATE_ABR, MONTH) %>%
  summarise(monthly_sum_flights = n()) %>%
  group_by(DEST_STATE_ABR) %>%
  summarise(mean_monthly_flights = round(mean(monthly_sum_flights)))

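# Important: although an imported shapefile behaves like a data frame, joining it to a data frame
# with anything other than sp's merge() returns a plain data frame, NOT a spatial object.
# all.x = TRUE keeps every state polygon; states with no LAX flights get NA for mean_monthly_flights
lax_destinations_state_shp <- merge(US_state_map, lax_destinations_state, by.y = "DEST_STATE_ABR", by.x = "STUSPS", all.x = TRUE)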
lax_destination_state_map_layer <- tm_shape(lax_destinations_state_shp) + 
  tm_polygons(col = "mean_monthly_flights", border.col = NA, palette = "viridis") + 
  tm_text("STUSPS")
lax_destinations_state_leaflet <- tmap_leaflet(x = lax_destination_state_map_layer)
lax_destinations_state_leaflet
#lax_destinations_state_airline <- LAX_outbound_flight_data %>%
#  group_by(DEST_STATE_ABR, OP_CARRIER_AIRLINE_ID, MONTH) %>%
#  summarise(monthly_sum_flights = n()) %>%
#  group_by(DEST_STATE_ABR, OP_CARRIER_AIRLINE_ID) %>%
#  summarise(mean_monthly_flights = round(mean(monthly_sum_flights))) %>%
#  inner_join(., airline_ID, by = c("OP_CARRIER_AIRLINE_ID" = "Code"))
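The two-step aggregation in the map code (count flights per state and month, then average across months) and the `all.x = TRUE` merge can be illustrated in base R on a toy data frame. The rows below are made up, not DOT data. They show two things worth keeping in mind when reading the map:

- A month with no flights to a state simply has no row, so it is not averaged in as zero. Here HI has flights in only one of the two months, so its mean is 1 rather than 0.5.
- States with no LAX service at all survive the left merge with `NA`, which is one way to surface candidate underserved states.

```r
# Toy flight records: one row per flight (made-up values)
flights <- data.frame(
  DEST_STATE_ABR = c("NV", "NV", "NV", "AZ", "AZ", "HI"),
  MONTH          = c(1, 1, 2, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Step 1: number of flights per state and month
flights$n <- 1
monthly <- aggregate(n ~ DEST_STATE_ABR + MONTH, data = flights, FUN = sum)

# Step 2: mean over the months in which the state had at least one flight
state_means <- aggregate(n ~ DEST_STATE_ABR, data = monthly, FUN = mean)
names(state_means)[2] <- "mean_monthly_flights"

# Left merge onto the full list of states: states without flights get NA
all_states <- data.frame(STUSPS = c("AZ", "HI", "NV", "VT"), stringsAsFactors = FALSE)
merged <- merge(all_states, state_means, by.x = "STUSPS", by.y = "DEST_STATE_ABR", all.x = TRUE)
merged
```

If seasonal routes should count their idle months as zero, one option is to divide the six-month total by 6 instead of averaging the monthly counts.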

We are often told that competition increases choice and benefits the consumer. But what do these benefits really look like? Lower fare prices? Better flight timings? Improved punctuality? Better service on board?

On fare prices: it is probably safe to say that, promotional offers aside, almost everyone on any given flight paid a different price for their ticket, and airlines will not divulge that information anyway. So it is impossible for us to look at the financial dimension without relying extensively on pre-processed and aggregated statistics supplied by the Department of Transportation.

First, flight timings. Depending on how observant you are, flight timings can seem either curious or instinctively sensible (“of course it is scheduled like this”). But instead of looking at how the flights are spread throughout the day, we can look at how close they are scheduled to their competitors’ flights.
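One way to measure "how close" is the gap, in minutes, between each flight and the nearest scheduled departure by a different airline to the same destination. Below is a minimal base-R sketch on made-up rows. It assumes `CRS_DEP_TIME` holds local clock times in hhmm form (for example "1305" for 1:05 pm). That is why times are converted to minutes after midnight before subtracting; subtracting raw hhmm values would treat 1300 minus 1259 as 41 minutes. `hhmm_to_minutes()` and `competitor_gap()` are hypothetical helpers. Gaps spanning midnight are ignored for simplicity.

```r
# Convert hhmm clock times (e.g. "1305") to minutes after midnight (e.g. 785)
hhmm_to_minutes <- function(hhmm) {
  hhmm <- as.numeric(hhmm)
  (hhmm %/% 100) * 60 + hhmm %% 100
}

# For each flight, minutes to the nearest departure by another carrier to the same destination
competitor_gap <- function(dest, carrier, dep_hhmm) {
  dep <- hhmm_to_minutes(dep_hhmm)
  sapply(seq_along(dep), function(i) {
    rivals <- dest == dest[i] & carrier != carrier[i]
    if (!any(rivals)) return(NA_real_)  # no competitor on this route
    min(abs(dep[rivals] - dep[i]))
  })
}

# Made-up schedule for illustration
sched <- data.frame(
  DEST         = c("SFO", "SFO", "SFO", "JFK"),
  CARRIER      = c("AA", "UA", "UA", "DL"),
  CRS_DEP_TIME = c("0700", "0715", "0900", "0800"),
  stringsAsFactors = FALSE
)
sched$gap_min <- competitor_gap(sched$DEST, sched$CARRIER, sched$CRS_DEP_TIME)
sched
```

A small median gap on a route would suggest airlines schedule head-to-head. A large gap would suggest they spread departures out.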

Second, we can look at how punctual flights are. It is fine for them to arrive early, but not late.

# Flights that arrived 15 or more minutes late, per airline
late_flights <- LAX_outbound_flight_data %>%
  filter(ARR_DEL15 > 0) %>%
  group_by(OP_CARRIER_AIRLINE_ID) %>%
  summarise(number_delayed_flights = n())
# Flights that arrived early or less than 15 minutes late, per airline
ontime_early_flights <- LAX_outbound_flight_data %>%
  filter(ARR_DEL15 == 0) %>%
  group_by(OP_CARRIER_AIRLINE_ID) %>%
  summarise(number_ontime_early_flights = n())
# Rows with a missing ARR_DEL15 (e.g. cancelled or diverted flights) are dropped by both filters
LAX_airlines_punctuality_summary <- inner_join(ontime_early_flights, late_flights, by = "OP_CARRIER_AIRLINE_ID") %>%
  mutate(pct_late = round(number_delayed_flights/(number_delayed_flights + number_ontime_early_flights)*100,1),
         pct_ontime_early = round(number_ontime_early_flights/(number_delayed_flights + number_ontime_early_flights)*100,1)) %>%
  arrange(desc(pct_late)) %>%
  inner_join(., airline_ID, by = c("OP_CARRIER_AIRLINE_ID" = "Code"))
datatable(LAX_airlines_punctuality_summary)

Surprised by the results? Bear in mind that the Department of Transportation classifies a flight as delayed only if it arrived 15 minutes or more after its scheduled arrival time.
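As a sanity check on that definition, `ARR_DEL15` can be recomputed from `ARR_DELAY` (minutes late, negative when early). A carrier's share of late arrivals is then just the mean of the indicator. Below is a base-R sketch on made-up values. It assumes cancelled or diverted flights carry `NA` in both fields. Such flights are dropped here, just as rows with a missing `ARR_DEL15` are dropped by the `filter()` calls in the punctuality code.

```r
# Made-up arrival delays in minutes (negative = early; NA = cancelled/diverted)
arrivals <- data.frame(
  carrier   = c("AA", "AA", "AA", "DL", "DL", "DL"),
  ARR_DELAY = c(-5, 14, 15, 40, NA, 0)
)

# DOT-style 15-minute indicator: 1 if the flight arrived 15 or more minutes late
arrivals$ARR_DEL15 <- as.integer(arrivals$ARR_DELAY >= 15)

# Percentage of completed flights arriving late, per carrier
completed <- arrivals[!is.na(arrivals$ARR_DEL15), ]
pct_late <- aggregate(ARR_DEL15 ~ carrier, data = completed,
                      FUN = function(x) round(mean(x) * 100, 1))
names(pct_late)[2] <- "pct_late"
pct_late
```

The 14-minute arrival counts as on time under this definition, while the 15-minute one counts as late.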

To look at these two things, we will pick out destinations served from LAX by more than one airline, which ought to result in competitive behavior.

# Departure delays do not matter as long as the flight can still arrive on time, given the padding built into schedules
# People make decisions and plans around the scheduled arrival time, so a flight that leaves late but arrives on time is fine
# How often are flights delayed on similarly competitive routes?

## First, count the number of airlines serving each destination airport (not city market), because each airport is run differently
## For each airline serving that airport, find the proportion of flights that arrive late, regardless of any departure delay at LAX
## Grouping by airline and number of competitors, find out how late an airline is on average

LAX_outbound_flight_data <- LAX_outbound_flight_data %>%
  group_by(DEST_AIRPORT_ID) %>%
  mutate(number_airlines_serving_route = n_distinct(OP_CARRIER_AIRLINE_ID)) %>%
  ungroup()
delay_by_competitiveness <- LAX_outbound_flight_data %>%
  group_by(DEST_AIRPORT_ID, OP_CARRIER_AIRLINE_ID) %>%
  # All rows for this airline and airport, counted before filtering (includes rows with no ARR_DELAY)
  mutate(number_flights = n()) %>%
  ungroup() %>%
  # Keep only flights that arrived after their scheduled time
  filter(ARR_DELAY > 0) %>%
  group_by(DEST_AIRPORT_ID, OP_CARRIER_AIRLINE_ID, number_airlines_serving_route) %>%
  summarise(mean_delay = round(mean(ARR_DELAY), 2),
            number_delayed_flights = n(),
            prop_delayed_flights = round(number_delayed_flights / first(number_flights) * 100, 2)) %>%
  # For each airline, average across destination airports with the same number of competing airlines
  group_by(OP_CARRIER_AIRLINE_ID, number_airlines_serving_route) %>%
  summarise(mean_delay_time = round(mean(mean_delay), 2),
            mean_prop_delayed_flights = round(mean(prop_delayed_flights), 2)) %>%
  inner_join(., airline_ID, by = c("OP_CARRIER_AIRLINE_ID" = "Code")) %>%
  arrange(OP_CARRIER_AIRLINE_ID, desc(mean_delay_time))
delay_scatter <- ggplot(delay_by_competitiveness, aes(x = number_airlines_serving_route, y = mean_delay_time)) + 
  geom_point(aes(color = Description))
ggplotly(delay_scatter)
competitiveness_delay_regression <- lm(mean_delay_time ~ number_airlines_serving_route + Description + mean_prop_delayed_flights, data = delay_by_competitiveness)
model_fitted <- augment(competitiveness_delay_regression, type.predict = "response")
summary(model_fitted)
##  mean_delay_time      number_airlines_serving_route Description       
##  Min.   : 9.6700000   Min.   :1.00000000            Length:59         
##  1st Qu.:32.0300000   1st Qu.:2.50000000            Class :character  
##  Median :37.4700000   Median :4.00000000            Mode  :character  
##  Mean   :37.9415254   Mean   :4.16949153                              
##  3rd Qu.:44.0400000   3rd Qu.:6.00000000                              
##  Max.   :66.5000000   Max.   :8.00000000                              
##  mean_prop_delayed_flights    .fitted              .se.fit          
##  Min.   :11.7600000        Min.   :23.2310940   Min.   :2.87425181  
##  1st Qu.:28.0800000        1st Qu.:33.6831387   1st Qu.:3.00832584  
##  Median :31.9000000        Median :38.9192234   Median :3.21541215  
##  Mean   :32.5932203        Mean   :37.9415254   Mean   :3.47753304  
##  3rd Qu.:35.7850000        3rd Qu.:41.9935141   3rd Qu.:3.56670293  
##  Max.   :59.7900000        Max.   :56.0085853   Max.   :5.63083635  
##      .resid                  .hat                 .sigma          
##  Min.   :-19.04471522   Min.   :0.144234010   Min.   :6.97733914  
##  1st Qu.: -3.74132299   1st Qu.:0.158004287   1st Qu.:7.52609778  
##  Median : -0.45300532   Median :0.180505878   Median :7.61421630  
##  Mean   :  0.00000000   Mean   :0.220338983   Mean   :7.56074615  
##  3rd Qu.:  5.21865305   3rd Qu.:0.222112283   3rd Qu.:7.64536277  
##  Max.   : 11.45392292   Max.   :0.553558932   Max.   :7.65179647  
##     .cooksd                .std.resid            
##  Min.   :0.00000064243   Min.   :-2.78421764843  
##  1st Qu.:0.00131591516   1st Qu.:-0.54025207704  
##  Median :0.00794340822   Median :-0.06508904839  
##  Mean   :0.03685270637   Mean   :-0.00240660489  
##  3rd Qu.:0.03326151478   3rd Qu.: 0.85297957760  
##  Max.   :0.36728210364   Max.   : 2.03437962066
model_plot <- ggplot(delay_by_competitiveness, aes(number_airlines_serving_route, mean_delay_time, color = Description)) + 
  geom_point() + 
  geom_line(data = model_fitted, aes(y = .fitted))
ggplotly(model_plot)

Creating a random forest model

# features with just one level have to be eliminated
LAX_outbound_flight_data_random_forest <- LAX_outbound_flight_data[, c(2:4, 7,9, 18, 20, 26:28, 35:39, 52)] %>%
  na.omit()
# Convert the scheduled and actual clock times (stored as hhmm values) to numbers for the model
LAX_outbound_flight_data_random_forest <- LAX_outbound_flight_data_random_forest %>%
  mutate_at(vars(CRS_DEP_TIME, DEP_TIME, CRS_ARR_TIME, ARR_TIME), as.numeric)
# Learning Vector Quantization (LVQ) cannot rank features here because the target is continuous
# ("error: wrong model type for regression"), so look for highly correlated features first
correlation_matrix <- cor(LAX_outbound_flight_data_random_forest[, c(1:12, 14:16)])
correlation_matrix
##                                           MONTH       DAY_OF_MONTH
## MONTH                          1.00000000000000  0.017176234172603
## DAY_OF_MONTH                   0.01717623417260  1.000000000000000
## DAY_OF_WEEK                    0.02307446296028  0.011901800794869
## OP_CARRIER_AIRLINE_ID          0.00470108778233  0.000918258304269
## OP_CARRIER_FL_NUM             -0.00917693065039  0.003321457007996
## DEST_AIRPORT_ID               -0.00801902024373  0.000665927150447
## DEST_CITY_MARKET_ID           -0.00576921220033  0.000427490025392
## CRS_DEP_TIME                   0.01815938698817  0.004718863537939
## DEP_TIME                       0.01001381251872  0.004645903001565
## DEP_DELAY                      0.02239216187749 -0.001414195076169
## CRS_ARR_TIME                  -0.03060368253775 -0.003768601246618
## ARR_TIME                      -0.03517428689688 -0.006057694656732
## ARR_DEL15                      0.02178434712863 -0.004139276703189
## ARR_DELAY_GROUP                0.02890457492379  0.000277926470520
## number_airlines_serving_route -0.00815610622940 -0.000982075600473
##                                      DAY_OF_WEEK OP_CARRIER_AIRLINE_ID
## MONTH                          0.023074462960282     0.004701087782330
## DAY_OF_MONTH                   0.011901800794869     0.000918258304269
## DAY_OF_WEEK                    1.000000000000000     0.026175432619677
## OP_CARRIER_AIRLINE_ID          0.026175432619677     1.000000000000000
## OP_CARRIER_FL_NUM              0.085361092100545     0.237565799560459
## DEST_AIRPORT_ID               -0.009108655745024    -0.062213296050852
## DEST_CITY_MARKET_ID            0.001265790188139     0.080595097761853
## CRS_DEP_TIME                   0.001519698720314     0.022336342620223
## DEP_TIME                      -0.001729763434232     0.007037482264338
## DEP_DELAY                      0.000354365607972    -0.041168281341649
## CRS_ARR_TIME                  -0.001494911160911    -0.000267275815775
## ARR_TIME                      -0.000836175766880     0.004289622745428
## ARR_DEL15                     -0.000392787182351    -0.041546444151003
## ARR_DELAY_GROUP               -0.001236984362770    -0.032120377616726
## number_airlines_serving_route -0.001928933565933    -0.161137859771195
##                               OP_CARRIER_FL_NUM    DEST_AIRPORT_ID
## MONTH                         -0.00917693065039 -0.008019020243725
## DAY_OF_MONTH                   0.00332145700800  0.000665927150447
## DAY_OF_WEEK                    0.08536109210055 -0.009108655745024
## OP_CARRIER_AIRLINE_ID          0.23756579956046 -0.062213296050852
## OP_CARRIER_FL_NUM              1.00000000000000  0.123246458725826
## DEST_AIRPORT_ID                0.12324645872583  1.000000000000000
## DEST_CITY_MARKET_ID            0.16756310734418  0.568100811727880
## CRS_DEP_TIME                   0.05051480972232  0.032690906257176
## DEP_TIME                       0.05904655691882  0.046816190088655
## DEP_DELAY                     -0.01701806360731 -0.012158952590087
## CRS_ARR_TIME                   0.07030011858049  0.043292403015316
## ARR_TIME                       0.06408262095990  0.047517275879694
## ARR_DEL15                     -0.02186375862760  0.021674064091241
## ARR_DELAY_GROUP               -0.01630814300752  0.023030851950269
## number_airlines_serving_route -0.21554607654630 -0.035058081051460
##                               DEST_CITY_MARKET_ID      CRS_DEP_TIME
## MONTH                          -0.005769212200332  0.01815938698817
## DAY_OF_MONTH                    0.000427490025392  0.00471886353794
## DAY_OF_WEEK                     0.001265790188139  0.00151969872031
## OP_CARRIER_AIRLINE_ID           0.080595097761853  0.02233634262022
## OP_CARRIER_FL_NUM               0.167563107344185  0.05051480972232
## DEST_AIRPORT_ID                 0.568100811727880  0.03269090625718
## DEST_CITY_MARKET_ID             1.000000000000000  0.04881483563294
## CRS_DEP_TIME                    0.048814835632942  1.00000000000000
## DEP_TIME                        0.059577704837697  0.91403852504151
## DEP_DELAY                      -0.013699931832770  0.05470327253197
## CRS_ARR_TIME                    0.032334060530758  0.15757456072803
## ARR_TIME                        0.033883294154481  0.11733747777693
## ARR_DEL15                       0.005672026340747  0.07503735806426
## ARR_DELAY_GROUP                 0.004589805535651  0.06522088658215
## number_airlines_serving_route  -0.152212683031355 -0.00720864070927
##                                         DEP_TIME          DEP_DELAY
## MONTH                          0.010013812518724  0.022392161877493
## DAY_OF_MONTH                   0.004645903001565 -0.001414195076169
## DAY_OF_WEEK                   -0.001729763434232  0.000354365607972
## OP_CARRIER_AIRLINE_ID          0.007037482264338 -0.041168281341649
## OP_CARRIER_FL_NUM              0.059046556918823 -0.017018063607306
## DEST_AIRPORT_ID                0.046816190088655 -0.012158952590087
## DEST_CITY_MARKET_ID            0.059577704837697 -0.013699931832770
## CRS_DEP_TIME                   0.914038525041509  0.054703272531965
## DEP_TIME                       1.000000000000000  0.077499882426295
## DEP_DELAY                      0.077499882426295  1.000000000000000
## CRS_ARR_TIME                   0.208089635535398  0.044807162748377
## ARR_TIME                       0.161852171495238 -0.036732442121359
## ARR_DEL15                      0.092389776488591  0.555620387185229
## ARR_DELAY_GROUP                0.096808453457937  0.816137245838646
## number_airlines_serving_route  0.000594182772704  0.013309763152609
##                                     CRS_ARR_TIME          ARR_TIME
## MONTH                         -0.030603682537752 -0.03517428689688
## DAY_OF_MONTH                  -0.003768601246618 -0.00605769465673
## DAY_OF_WEEK                   -0.001494911160911 -0.00083617576688
## OP_CARRIER_AIRLINE_ID         -0.000267275815775  0.00428962274543
## OP_CARRIER_FL_NUM              0.070300118580494  0.06408262095990
## DEST_AIRPORT_ID                0.043292403015316  0.04751727587969
## DEST_CITY_MARKET_ID            0.032334060530758  0.03388329415448
## CRS_DEP_TIME                   0.157574560728031  0.11733747777693
## DEP_TIME                       0.208089635535398  0.16185217149524
## DEP_DELAY                      0.044807162748377 -0.03673244212136
## CRS_ARR_TIME                   1.000000000000000  0.83230024070623
## ARR_TIME                       0.832300240706231  1.00000000000000
## ARR_DEL15                      0.049740662001855 -0.01422927248191
## ARR_DELAY_GROUP                0.052027914616019 -0.03321514403246
## number_airlines_serving_route  0.000712604700594  0.00875539201300
##                                        ARR_DEL15   ARR_DELAY_GROUP
## MONTH                          0.021784347128632  0.02890457492379
## DAY_OF_MONTH                  -0.004139276703189  0.00027792647052
## DAY_OF_WEEK                   -0.000392787182351 -0.00123698436277
## OP_CARRIER_AIRLINE_ID         -0.041546444151003 -0.03212037761673
## OP_CARRIER_FL_NUM             -0.021863758627597 -0.01630814300752
## DEST_AIRPORT_ID                0.021674064091241  0.02303085195027
## DEST_CITY_MARKET_ID            0.005672026340747  0.00458980553565
## CRS_DEP_TIME                   0.075037358064264  0.06522088658215
## DEP_TIME                       0.092389776488591  0.09680845345794
## DEP_DELAY                      0.555620387185229  0.81613724583865
## CRS_ARR_TIME                   0.049740662001855  0.05202791461602
## ARR_TIME                      -0.014229272481913 -0.03321514403246
## ARR_DEL15                      1.000000000000000  0.76486074354772
## ARR_DELAY_GROUP                0.764860743547720  1.00000000000000
## number_airlines_serving_route  0.043513787079759  0.04162619879705
##                               number_airlines_serving_route
## MONTH                                    -0.008156106229397
## DAY_OF_MONTH                             -0.000982075600473
## DAY_OF_WEEK                              -0.001928933565933
## OP_CARRIER_AIRLINE_ID                    -0.161137859771195
## OP_CARRIER_FL_NUM                        -0.215546076546301
## DEST_AIRPORT_ID                          -0.035058081051460
## DEST_CITY_MARKET_ID                      -0.152212683031355
## CRS_DEP_TIME                             -0.007208640709265
## DEP_TIME                                  0.000594182772704
## DEP_DELAY                                 0.013309763152609
## CRS_ARR_TIME                              0.000712604700594
## ARR_TIME                                  0.008755392013002
## ARR_DEL15                                 0.043513787079759
## ARR_DELAY_GROUP                           0.041626198797050
## number_airlines_serving_route             1.000000000000000
high_corr <- findCorrelation(correlation_matrix, cutoff = 0.75)
print(high_corr)
## [1] 14  9 11
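`findCorrelation()` returns positions within the matrix it is given, not within the data frame. `correlation_matrix` was computed on columns `c(1:12, 14:16)`, leaving out `ARR_DELAY`. As a result, position 14 in the matrix is `ARR_DELAY_GROUP`, whereas column 14 of the data frame is `ARR_DEL15`. Translating positions into names, with `colnames(correlation_matrix)[high_corr]`, avoids that off-by-one. The sketch below reproduces the mapping using only the column names printed above, so it runs without the data or caret:

```r
# Column order of correlation_matrix, as printed above (ARR_DELAY, the target, was left out)
matrix_cols <- c("MONTH", "DAY_OF_MONTH", "DAY_OF_WEEK", "OP_CARRIER_AIRLINE_ID",
                 "OP_CARRIER_FL_NUM", "DEST_AIRPORT_ID", "DEST_CITY_MARKET_ID",
                 "CRS_DEP_TIME", "DEP_TIME", "DEP_DELAY", "CRS_ARR_TIME", "ARR_TIME",
                 "ARR_DEL15", "ARR_DELAY_GROUP", "number_airlines_serving_route")

# Column order of the data frame itself: ARR_DELAY sits in position 13
frame_cols <- append(matrix_cols, "ARR_DELAY", after = 12)

high_corr <- c(14, 9, 11)  # positions returned by findCorrelation() above

matrix_cols[high_corr]  # the features actually flagged
frame_cols[high_corr]   # the same positions applied to the data frame: 14 is now ARR_DEL15
```

Dropping by name is less error-prone than typing position vectors by hand. For example, `setdiff(names(df), colnames(correlation_matrix)[high_corr])` keeps the unflagged columns.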
# Refined dataset: drops DEP_TIME (column 9) and CRS_ARR_TIME (column 11), two of the features flagged above
LAX_outbound_flight_data_random_forest <- LAX_outbound_flight_data_random_forest[, c(1:8, 10, 12, 13, 14:16)]
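# 80/20 split: the "_test" data frame holds a random 80% of rows and is the one used for training below;
# the remaining 20% (rows not matched by anti_join) is kept as the validation set
LAX_outbound_flight_data_random_forest_test <- sample_n(LAX_outbound_flight_data_random_forest, round(0.8*(nrow(LAX_outbound_flight_data_random_forest))))
LAX_outbound_flight_data_random_forest_validation <- anti_join(LAX_outbound_flight_data_random_forest, LAX_outbound_flight_data_random_forest_test)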
## Joining, by = c("MONTH", "DAY_OF_MONTH", "DAY_OF_WEEK", "OP_CARRIER_AIRLINE_ID", "OP_CARRIER_FL_NUM", "DEST_AIRPORT_ID", "DEST_CITY_MARKET_ID", "CRS_DEP_TIME", "DEP_DELAY", "ARR_TIME", "ARR_DELAY", "ARR_DEL15", "ARR_DELAY_GROUP", "number_airlines_serving_route")
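`anti_join()` without a `by` argument matches on every column. If a row left out of the sample is identical in every column to a sampled row, it is therefore also dropped from the validation set. Splitting on row positions avoids that, and calling `set.seed()` first makes the split reproducible. Below is a base-R sketch on a toy data frame. The seed value and the names `train_idx`, `train`, and `holdout` are arbitrary choices for illustration.

```r
set.seed(2019)  # arbitrary seed, for a reproducible split
toy <- data.frame(x = c(1, 1, 2, 3, 4, 5, 6, 7, 8, 9))  # note the duplicated first two rows

# Draw 80% of row positions for training; the remaining rows form the hold-out set
train_idx <- sample(nrow(toy), round(0.8 * nrow(toy)))
train   <- toy[train_idx, , drop = FALSE]
holdout <- toy[-train_idx, , drop = FALSE]

c(train = nrow(train), holdout = nrow(holdout))
```

Because the split is by position, the two sets always add up to the full data, even when some rows are exact duplicates.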
# 10-fold cross-validation, printing progress for each fold and tuning combination
control <- trainControl(method = "cv",
                        number = 10,
                        verboseIter = TRUE)
rf_LAX <- train(ARR_DELAY ~., 
                data = LAX_outbound_flight_data_random_forest_test,
                method = "ranger",
                trControl = control)
## + Fold01: mtry= 2, min.node.size=5, splitrule=variance
## - Fold01: mtry= 2, min.node.size=5, splitrule=variance
## + Fold01: mtry= 7, min.node.size=5, splitrule=variance
## - Fold01: mtry= 7, min.node.size=5, splitrule=variance
## + Fold01: mtry=13, min.node.size=5, splitrule=variance
## - Fold01: mtry=13, min.node.size=5, splitrule=variance
## + Fold01: mtry= 2, min.node.size=5, splitrule=extratrees
## - Fold01: mtry= 2, min.node.size=5, splitrule=extratrees
## + Fold01: mtry= 7, min.node.size=5, splitrule=extratrees
## - Fold01: mtry= 7, min.node.size=5, splitrule=extratrees
## + Fold01: mtry=13, min.node.size=5, splitrule=extratrees
## - Fold01: mtry=13, min.node.size=5, splitrule=extratrees
## + Fold02: mtry= 2, min.node.size=5, splitrule=variance
## - Fold02: mtry= 2, min.node.size=5, splitrule=variance
## + Fold02: mtry= 7, min.node.size=5, splitrule=variance
## - Fold02: mtry= 7, min.node.size=5, splitrule=variance
## + Fold02: mtry=13, min.node.size=5, splitrule=variance
## - Fold02: mtry=13, min.node.size=5, splitrule=variance
## + Fold02: mtry= 2, min.node.size=5, splitrule=extratrees
## - Fold02: mtry= 2, min.node.size=5, splitrule=extratrees
## + Fold02: mtry= 7, min.node.size=5, splitrule=extratrees
## - Fold02: mtry= 7, min.node.size=5, splitrule=extratrees
## + Fold02: mtry=13, min.node.size=5, splitrule=extratrees
## - Fold02: mtry=13, min.node.size=5, splitrule=extratrees
## + Fold03: mtry= 2, min.node.size=5, splitrule=variance
## - Fold03: mtry= 2, min.node.size=5, splitrule=variance
## + Fold03: mtry= 7, min.node.size=5, splitrule=variance
## - Fold03: mtry= 7, min.node.size=5, splitrule=variance
## + Fold03: mtry=13, min.node.size=5, splitrule=variance
## - Fold03: mtry=13, min.node.size=5, splitrule=variance
## + Fold03: mtry= 2, min.node.size=5, splitrule=extratrees
## - Fold03: mtry= 2, min.node.size=5, splitrule=extratrees
## + Fold03: mtry= 7, min.node.size=5, splitrule=extratrees
## - Fold03: mtry= 7, min.node.size=5, splitrule=extratrees
## + Fold03: mtry=13, min.node.size=5, splitrule=extratrees
## - Fold03: mtry=13, min.node.size=5, splitrule=extratrees
## + Fold04: mtry= 2, min.node.size=5, splitrule=variance
## - Fold04: mtry= 2, min.node.size=5, splitrule=variance
## + Fold04: mtry= 7, min.node.size=5, splitrule=variance
## - Fold04: mtry= 7, min.node.size=5, splitrule=variance
## + Fold04: mtry=13, min.node.size=5, splitrule=variance
## - Fold04: mtry=13, min.node.size=5, splitrule=variance
## + Fold04: mtry= 2, min.node.size=5, splitrule=extratrees
## - Fold04: mtry= 2, min.node.size=5, splitrule=extratrees
## + Fold04: mtry= 7, min.node.size=5, splitrule=extratrees
## - Fold04: mtry= 7, min.node.size=5, splitrule=extratrees
## + Fold04: mtry=13, min.node.size=5, splitrule=extratrees
## - Fold04: mtry=13, min.node.size=5, splitrule=extratrees
## + Fold05: mtry= 2, min.node.size=5, splitrule=variance
## - Fold05: mtry= 2, min.node.size=5, splitrule=variance
## + Fold05: mtry= 7, min.node.size=5, splitrule=variance
## - Fold05: mtry= 7, min.node.size=5, splitrule=variance
## + Fold05: mtry=13, min.node.size=5, splitrule=variance
## - Fold05: mtry=13, min.node.size=5, splitrule=variance
## + Fold05: mtry= 2, min.node.size=5, splitrule=extratrees
## - Fold05: mtry= 2, min.node.size=5, splitrule=extratrees
## + Fold05: mtry= 7, min.node.size=5, splitrule=extratrees
## - Fold05: mtry= 7, min.node.size=5, splitrule=extratrees
## + Fold05: mtry=13, min.node.size=5, splitrule=extratrees
## - Fold05: mtry=13, min.node.size=5, splitrule=extratrees
## + Fold06: mtry= 2, min.node.size=5, splitrule=variance
## - Fold06: mtry= 2, min.node.size=5, splitrule=variance
## + Fold06: mtry= 7, min.node.size=5, splitrule=variance
## - Fold06: mtry= 7, min.node.size=5, splitrule=variance
## + Fold06: mtry=13, min.node.size=5, splitrule=variance
## - Fold06: mtry=13, min.node.size=5, splitrule=variance
## + Fold06: mtry= 2, min.node.size=5, splitrule=extratrees
## - Fold06: mtry= 2, min.node.size=5, splitrule=extratrees
## + Fold06: mtry= 7, min.node.size=5, splitrule=extratrees
## - Fold06: mtry= 7, min.node.size=5, splitrule=extratrees
## + Fold06: mtry=13, min.node.size=5, splitrule=extratrees
## - Fold06: mtry=13, min.node.size=5, splitrule=extratrees
## + Fold07: mtry= 2, min.node.size=5, splitrule=variance
## - Fold07: mtry= 2, min.node.size=5, splitrule=variance
## + Fold07: mtry= 7, min.node.size=5, splitrule=variance
## - Fold07: mtry= 7, min.node.size=5, splitrule=variance
## + Fold07: mtry=13, min.node.size=5, splitrule=variance
## - Fold07: mtry=13, min.node.size=5, splitrule=variance
## + Fold07: mtry= 2, min.node.size=5, splitrule=extratrees
## - Fold07: mtry= 2, min.node.size=5, splitrule=extratrees
## + Fold07: mtry= 7, min.node.size=5, splitrule=extratrees
## - Fold07: mtry= 7, min.node.size=5, splitrule=extratrees
## + Fold07: mtry=13, min.node.size=5, splitrule=extratrees
## - Fold07: mtry=13, min.node.size=5, splitrule=extratrees
## + Fold08: mtry= 2, min.node.size=5, splitrule=variance 
## - Fold08: mtry= 2, min.node.size=5, splitrule=variance 
## + Fold08: mtry= 7, min.node.size=5, splitrule=variance 
## Growing trees.. Progress: 37%. Estimated remaining time: 53 seconds.
## Growing trees.. Progress: 74%. Estimated remaining time: 22 seconds.
## - Fold08: mtry= 7, min.node.size=5, splitrule=variance 
## + Fold08: mtry=13, min.node.size=5, splitrule=variance 
## Growing trees.. Progress: 21%. Estimated remaining time: 1 minute, 53 seconds.
## Growing trees.. Progress: 43%. Estimated remaining time: 1 minute, 22 seconds.
## Growing trees.. Progress: 64%. Estimated remaining time: 51 seconds.
## Growing trees.. Progress: 86%. Estimated remaining time: 19 seconds.
## - Fold08: mtry=13, min.node.size=5, splitrule=variance 
## + Fold08: mtry= 2, min.node.size=5, splitrule=extratrees 
## - Fold08: mtry= 2, min.node.size=5, splitrule=extratrees 
## + Fold08: mtry= 7, min.node.size=5, splitrule=extratrees 
## Growing trees.. Progress: 42%. Estimated remaining time: 43 seconds.
## Growing trees.. Progress: 85%. Estimated remaining time: 11 seconds.
## - Fold08: mtry= 7, min.node.size=5, splitrule=extratrees 
## + Fold08: mtry=13, min.node.size=5, splitrule=extratrees 
## Growing trees.. Progress: 24%. Estimated remaining time: 1 minute, 37 seconds.
## Growing trees.. Progress: 49%. Estimated remaining time: 1 minute, 4 seconds.
## Growing trees.. Progress: 73%. Estimated remaining time: 34 seconds.
## Growing trees.. Progress: 97%. Estimated remaining time: 3 seconds.
## - Fold08: mtry=13, min.node.size=5, splitrule=extratrees 
## + Fold09: mtry= 2, min.node.size=5, splitrule=variance 
## - Fold09: mtry= 2, min.node.size=5, splitrule=variance 
## + Fold09: mtry= 7, min.node.size=5, splitrule=variance 
## Growing trees.. Progress: 37%. Estimated remaining time: 53 seconds.
## Growing trees.. Progress: 74%. Estimated remaining time: 21 seconds.
## - Fold09: mtry= 7, min.node.size=5, splitrule=variance 
## + Fold09: mtry=13, min.node.size=5, splitrule=variance 
## Growing trees.. Progress: 22%. Estimated remaining time: 1 minute, 52 seconds.
## Growing trees.. Progress: 43%. Estimated remaining time: 1 minute, 20 seconds.
## Growing trees.. Progress: 65%. Estimated remaining time: 49 seconds.
## Growing trees.. Progress: 87%. Estimated remaining time: 18 seconds.
## - Fold09: mtry=13, min.node.size=5, splitrule=variance 
## + Fold09: mtry= 2, min.node.size=5, splitrule=extratrees 
## - Fold09: mtry= 2, min.node.size=5, splitrule=extratrees 
## + Fold09: mtry= 7, min.node.size=5, splitrule=extratrees 
## Growing trees.. Progress: 42%. Estimated remaining time: 43 seconds.
## Growing trees.. Progress: 84%. Estimated remaining time: 11 seconds.
## - Fold09: mtry= 7, min.node.size=5, splitrule=extratrees 
## + Fold09: mtry=13, min.node.size=5, splitrule=extratrees 
## Growing trees.. Progress: 24%. Estimated remaining time: 1 minute, 40 seconds.
## Growing trees.. Progress: 48%. Estimated remaining time: 1 minute, 7 seconds.
## Growing trees.. Progress: 72%. Estimated remaining time: 35 seconds.
## Growing trees.. Progress: 97%. Estimated remaining time: 3 seconds.
## - Fold09: mtry=13, min.node.size=5, splitrule=extratrees 
## + Fold10: mtry= 2, min.node.size=5, splitrule=variance 
## - Fold10: mtry= 2, min.node.size=5, splitrule=variance 
## + Fold10: mtry= 7, min.node.size=5, splitrule=variance 
## Growing trees.. Progress: 37%. Estimated remaining time: 53 seconds.
## Growing trees.. Progress: 73%. Estimated remaining time: 22 seconds.
## - Fold10: mtry= 7, min.node.size=5, splitrule=variance 
## + Fold10: mtry=13, min.node.size=5, splitrule=variance 
## Growing trees.. Progress: 22%. Estimated remaining time: 1 minute, 52 seconds.
## Growing trees.. Progress: 43%. Estimated remaining time: 1 minute, 21 seconds.
## Growing trees.. Progress: 65%. Estimated remaining time: 50 seconds.
## Growing trees.. Progress: 87%. Estimated remaining time: 18 seconds.
## - Fold10: mtry=13, min.node.size=5, splitrule=variance 
## + Fold10: mtry= 2, min.node.size=5, splitrule=extratrees 
## - Fold10: mtry= 2, min.node.size=5, splitrule=extratrees 
## + Fold10: mtry= 7, min.node.size=5, splitrule=extratrees 
## Growing trees.. Progress: 42%. Estimated remaining time: 42 seconds.
## Growing trees.. Progress: 84%. Estimated remaining time: 11 seconds.
## - Fold10: mtry= 7, min.node.size=5, splitrule=extratrees 
## + Fold10: mtry=13, min.node.size=5, splitrule=extratrees 
## Growing trees.. Progress: 24%. Estimated remaining time: 1 minute, 40 seconds.
## Growing trees.. Progress: 48%. Estimated remaining time: 1 minute, 6 seconds.
## Growing trees.. Progress: 72%. Estimated remaining time: 35 seconds.
## Growing trees.. Progress: 97%. Estimated remaining time: 4 seconds.
## - Fold10: mtry=13, min.node.size=5, splitrule=extratrees 
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 13, splitrule = variance, min.node.size = 5 on full training set
## Growing trees.. Progress: 19%. Estimated remaining time: 2 minutes, 15 seconds.
## Growing trees.. Progress: 38%. Estimated remaining time: 1 minute, 42 seconds.
## Growing trees.. Progress: 57%. Estimated remaining time: 1 minute, 11 seconds.
## Growing trees.. Progress: 75%. Estimated remaining time: 40 seconds.
## Growing trees.. Progress: 93%. Estimated remaining time: 10 seconds.
print(rf_LAX)
## Random Forest 
## 
## 84702 samples
##    13 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 76232, 76230, 76231, 76232, 76233, 76232, ... 
## Resampling results across tuning parameters:
## 
##   mtry  splitrule   RMSE            Rsquared        MAE          
##    2    variance    10.84532328452  0.949788262185  4.21885539246
##    2    extratrees  10.75407471902  0.957197288942  4.47653667910
##    7    variance     5.94889445130  0.983385660566  3.72805645473
##    7    extratrees   5.73142503642  0.984914417201  3.83958009469
##   13    variance     5.33562040158  0.986958750974  3.70229522646
##   13    extratrees   5.38052339789  0.986736477105  3.83007129503
## 
## Tuning parameter 'min.node.size' was held constant at a value of 5
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were mtry = 13, splitrule =
##  variance and min.node.size = 5.
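The six rows in the table above appear to come from caret's default tuning grid for `ranger`: three `mtry` values crossed with two split rules, with `min.node.size` held at 5. As a sketch, the same grid can be written out explicitly with base R's `expand.grid()`. The object names `rf_grid`, `n_folds` and `n_fits` are illustrative. Each fold fits every row of the grid, which is why the log above repeats six `+`/`-` pairs per fold before the final refit on the full training set.

```r
# Explicit version of the grid caret searched for the regression model:
# 3 values of mtry x 2 split rules, with min.node.size fixed at 5
rf_grid <- expand.grid(
  mtry             = c(2, 7, 13),
  splitrule        = c("variance", "extratrees"),
  min.node.size    = 5,
  stringsAsFactors = FALSE
)

# 10-fold cross-validation fits every grid row once per fold,
# plus one final fit of the chosen row on the full training set
n_folds <- 10
n_fits  <- nrow(rf_grid) * n_folds + 1

print(rf_grid)
print(n_fits)
```

Passing `tuneGrid = rf_grid` to `train()` would fix the search explicitly, and setting `verboseIter = FALSE` in `trainControl()` would drop the per-fold `+`/`-` lines from the output.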

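The following predicts the arrival delay (`ARR_DELAY`, in minutes) for flights departing from LAX, using the held-out validation set. On that set the RMSE is about 4.8 minutes and the R-squared is about 0.99 (0.9896). Those numbers look very good, but they are high enough that it is worth checking that no predictor is derived from the arrival delay itself, for example `ARR_DELAY_GROUP`, which bins arrival delay into 15-minute groups.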

# Predict arrival delay (minutes) for the held-out validation flights
rf_LAX_predict <- predict(rf_LAX, LAX_outbound_flight_data_random_forest_validation)

# Pair each prediction with the observed delay, using plain column names
rf_LAX_predict_df <- data.frame(
  observed  = LAX_outbound_flight_data_random_forest_validation$ARR_DELAY,
  predicted = rf_LAX_predict
)

# Predicted vs observed: the black line is y = x (perfect prediction),
# the blue line is a linear fit through the points
rf_accuracy_plot <- ggplot(rf_LAX_predict_df, aes(x = observed, y = predicted)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1) +
  geom_smooth(method = "lm")
rf_accuracy_plot

postResample(pred = rf_LAX_predict_df$predicted, obs = rf_LAX_predict_df$observed)
##           RMSE       Rsquared            MAE 
## 4.834413720146 0.989625666227 3.661690037464
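`postResample()` reports three numbers. As a base-R sketch of what they mean: RMSE and MAE are in the units of the target (minutes of arrival delay), and caret's `Rsquared` is, by default, the squared correlation between predictions and observations rather than 1 - SSE/SST. The `obs` and `pred` vectors below are made-up values for illustration only.

```r
# Made-up observed and predicted arrival delays (minutes), for illustration
obs  <- c(-10, 0, 15, 30, 60)
pred <- c(-8, 2, 12, 33, 55)

err <- pred - obs

rmse     <- sqrt(mean(err^2))   # penalises large misses more heavily
mae      <- mean(abs(err))      # average absolute miss, in minutes
rsquared <- cor(pred, obs)^2    # caret's default R-squared: squared correlation

round(c(RMSE = rmse, Rsquared = rsquared, MAE = mae), 3)
```

Because a squared correlation can stay high even when predictions are systematically shifted, the y = x line in the accuracy plot is a useful complement to the `Rsquared` figure.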

The following converts the arrival delay group (`ARR_DELAY_GROUP`) into a factor, so that caret treats the problem as classification and the predictions can be summarised in a confusion matrix.

LAX_outbound_flight_data_random_forest_test_2 <- LAX_outbound_flight_data_random_forest_test %>%
  mutate(ARR_DELAY_GROUP = as.factor(ARR_DELAY_GROUP))
LAX_outbound_flight_data_random_forest_validation_2 <- LAX_outbound_flight_data_random_forest_validation %>%
  mutate(ARR_DELAY_GROUP = as.factor(ARR_DELAY_GROUP))
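`as.factor()` builds its level set from whichever values happen to occur in each split, so a rare group missing from one split would leave the two factors with different levels, and caret's `confusionMatrix()` expects predictions and reference to share the same levels. A more defensive sketch fixes the levels explicitly. The 15 levels -2 to 12 are taken from the class list printed for the model; `train_split` and `valid_split` are hypothetical stand-ins for the real data frames, and the same `factor(..., levels = ...)` call could go inside `mutate()`.

```r
# All arrival-delay groups seen in this dataset (-2 to 12)
delay_group_levels <- as.character(-2:12)

# Hypothetical stand-ins for the test and validation splits
train_split <- data.frame(ARR_DELAY_GROUP = c(-1, 0, 0, 3, 12))
valid_split <- data.frame(ARR_DELAY_GROUP = c(-2, 0, 1))

# Use the same, explicitly ordered level set for both splits
train_split$ARR_DELAY_GROUP <- factor(train_split$ARR_DELAY_GROUP, levels = delay_group_levels)
valid_split$ARR_DELAY_GROUP <- factor(valid_split$ARR_DELAY_GROUP, levels = delay_group_levels)

# For comparison: as.factor() alone gives the splits different level sets
setdiff(levels(as.factor(c(-1, 0, 0, 3, 12))), levels(as.factor(c(-2, 0, 1))))
```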

Creating the random forest classification model, with the `ranger` method and the `control` object defined earlier:

rf_LAX_categorical <- train(ARR_DELAY_GROUP ~., 
                data = LAX_outbound_flight_data_random_forest_test_2,
                method = "ranger",
                trControl = control)
## + Fold01: mtry= 2, min.node.size=1, splitrule=gini 
## - Fold01: mtry= 2, min.node.size=1, splitrule=gini 
## + Fold01: mtry= 7, min.node.size=1, splitrule=gini 
## - Fold01: mtry= 7, min.node.size=1, splitrule=gini 
## + Fold01: mtry=13, min.node.size=1, splitrule=gini 
## - Fold01: mtry=13, min.node.size=1, splitrule=gini 
## + Fold01: mtry= 2, min.node.size=1, splitrule=extratrees 
## - Fold01: mtry= 2, min.node.size=1, splitrule=extratrees 
## + Fold01: mtry= 7, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 87%. Estimated remaining time: 4 seconds.
## - Fold01: mtry= 7, min.node.size=1, splitrule=extratrees 
## + Fold01: mtry=13, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 83%. Estimated remaining time: 6 seconds.
## - Fold01: mtry=13, min.node.size=1, splitrule=extratrees 
## + Fold02: mtry= 2, min.node.size=1, splitrule=gini 
## - Fold02: mtry= 2, min.node.size=1, splitrule=gini 
## + Fold02: mtry= 7, min.node.size=1, splitrule=gini 
## - Fold02: mtry= 7, min.node.size=1, splitrule=gini 
## + Fold02: mtry=13, min.node.size=1, splitrule=gini 
## - Fold02: mtry=13, min.node.size=1, splitrule=gini 
## + Fold02: mtry= 2, min.node.size=1, splitrule=extratrees 
## - Fold02: mtry= 2, min.node.size=1, splitrule=extratrees 
## + Fold02: mtry= 7, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 86%. Estimated remaining time: 4 seconds.
## - Fold02: mtry= 7, min.node.size=1, splitrule=extratrees 
## + Fold02: mtry=13, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 83%. Estimated remaining time: 6 seconds.
## - Fold02: mtry=13, min.node.size=1, splitrule=extratrees 
## + Fold03: mtry= 2, min.node.size=1, splitrule=gini 
## - Fold03: mtry= 2, min.node.size=1, splitrule=gini 
## + Fold03: mtry= 7, min.node.size=1, splitrule=gini 
## - Fold03: mtry= 7, min.node.size=1, splitrule=gini 
## + Fold03: mtry=13, min.node.size=1, splitrule=gini 
## - Fold03: mtry=13, min.node.size=1, splitrule=gini 
## + Fold03: mtry= 2, min.node.size=1, splitrule=extratrees 
## - Fold03: mtry= 2, min.node.size=1, splitrule=extratrees 
## + Fold03: mtry= 7, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 86%. Estimated remaining time: 4 seconds.
## - Fold03: mtry= 7, min.node.size=1, splitrule=extratrees 
## + Fold03: mtry=13, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 83%. Estimated remaining time: 6 seconds.
## - Fold03: mtry=13, min.node.size=1, splitrule=extratrees 
## + Fold04: mtry= 2, min.node.size=1, splitrule=gini 
## - Fold04: mtry= 2, min.node.size=1, splitrule=gini 
## + Fold04: mtry= 7, min.node.size=1, splitrule=gini 
## - Fold04: mtry= 7, min.node.size=1, splitrule=gini 
## + Fold04: mtry=13, min.node.size=1, splitrule=gini 
## - Fold04: mtry=13, min.node.size=1, splitrule=gini 
## + Fold04: mtry= 2, min.node.size=1, splitrule=extratrees 
## - Fold04: mtry= 2, min.node.size=1, splitrule=extratrees 
## + Fold04: mtry= 7, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 89%. Estimated remaining time: 3 seconds.
## - Fold04: mtry= 7, min.node.size=1, splitrule=extratrees 
## + Fold04: mtry=13, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 87%. Estimated remaining time: 4 seconds.
## - Fold04: mtry=13, min.node.size=1, splitrule=extratrees 
## + Fold05: mtry= 2, min.node.size=1, splitrule=gini 
## - Fold05: mtry= 2, min.node.size=1, splitrule=gini 
## + Fold05: mtry= 7, min.node.size=1, splitrule=gini 
## - Fold05: mtry= 7, min.node.size=1, splitrule=gini 
## + Fold05: mtry=13, min.node.size=1, splitrule=gini 
## - Fold05: mtry=13, min.node.size=1, splitrule=gini 
## + Fold05: mtry= 2, min.node.size=1, splitrule=extratrees 
## - Fold05: mtry= 2, min.node.size=1, splitrule=extratrees 
## + Fold05: mtry= 7, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 90%. Estimated remaining time: 3 seconds.
## - Fold05: mtry= 7, min.node.size=1, splitrule=extratrees 
## + Fold05: mtry=13, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 87%. Estimated remaining time: 4 seconds.
## - Fold05: mtry=13, min.node.size=1, splitrule=extratrees 
## + Fold06: mtry= 2, min.node.size=1, splitrule=gini 
## - Fold06: mtry= 2, min.node.size=1, splitrule=gini 
## + Fold06: mtry= 7, min.node.size=1, splitrule=gini 
## - Fold06: mtry= 7, min.node.size=1, splitrule=gini 
## + Fold06: mtry=13, min.node.size=1, splitrule=gini 
## - Fold06: mtry=13, min.node.size=1, splitrule=gini 
## + Fold06: mtry= 2, min.node.size=1, splitrule=extratrees 
## - Fold06: mtry= 2, min.node.size=1, splitrule=extratrees 
## + Fold06: mtry= 7, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 90%. Estimated remaining time: 3 seconds.
## - Fold06: mtry= 7, min.node.size=1, splitrule=extratrees 
## + Fold06: mtry=13, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 88%. Estimated remaining time: 4 seconds.
## - Fold06: mtry=13, min.node.size=1, splitrule=extratrees 
## + Fold07: mtry= 2, min.node.size=1, splitrule=gini 
## - Fold07: mtry= 2, min.node.size=1, splitrule=gini 
## + Fold07: mtry= 7, min.node.size=1, splitrule=gini 
## - Fold07: mtry= 7, min.node.size=1, splitrule=gini 
## + Fold07: mtry=13, min.node.size=1, splitrule=gini 
## - Fold07: mtry=13, min.node.size=1, splitrule=gini 
## + Fold07: mtry= 2, min.node.size=1, splitrule=extratrees 
## - Fold07: mtry= 2, min.node.size=1, splitrule=extratrees 
## + Fold07: mtry= 7, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 93%. Estimated remaining time: 2 seconds.
## - Fold07: mtry= 7, min.node.size=1, splitrule=extratrees 
## + Fold07: mtry=13, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 86%. Estimated remaining time: 5 seconds.
## - Fold07: mtry=13, min.node.size=1, splitrule=extratrees 
## + Fold08: mtry= 2, min.node.size=1, splitrule=gini 
## - Fold08: mtry= 2, min.node.size=1, splitrule=gini 
## + Fold08: mtry= 7, min.node.size=1, splitrule=gini 
## - Fold08: mtry= 7, min.node.size=1, splitrule=gini 
## + Fold08: mtry=13, min.node.size=1, splitrule=gini 
## - Fold08: mtry=13, min.node.size=1, splitrule=gini 
## + Fold08: mtry= 2, min.node.size=1, splitrule=extratrees 
## - Fold08: mtry= 2, min.node.size=1, splitrule=extratrees 
## + Fold08: mtry= 7, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 90%. Estimated remaining time: 3 seconds.
## - Fold08: mtry= 7, min.node.size=1, splitrule=extratrees 
## + Fold08: mtry=13, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 87%. Estimated remaining time: 4 seconds.
## - Fold08: mtry=13, min.node.size=1, splitrule=extratrees 
## + Fold09: mtry= 2, min.node.size=1, splitrule=gini 
## - Fold09: mtry= 2, min.node.size=1, splitrule=gini 
## + Fold09: mtry= 7, min.node.size=1, splitrule=gini 
## - Fold09: mtry= 7, min.node.size=1, splitrule=gini 
## + Fold09: mtry=13, min.node.size=1, splitrule=gini 
## - Fold09: mtry=13, min.node.size=1, splitrule=gini 
## + Fold09: mtry= 2, min.node.size=1, splitrule=extratrees 
## - Fold09: mtry= 2, min.node.size=1, splitrule=extratrees 
## + Fold09: mtry= 7, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 91%. Estimated remaining time: 3 seconds.
## - Fold09: mtry= 7, min.node.size=1, splitrule=extratrees 
## + Fold09: mtry=13, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 87%. Estimated remaining time: 4 seconds.
## - Fold09: mtry=13, min.node.size=1, splitrule=extratrees 
## + Fold10: mtry= 2, min.node.size=1, splitrule=gini 
## - Fold10: mtry= 2, min.node.size=1, splitrule=gini 
## + Fold10: mtry= 7, min.node.size=1, splitrule=gini 
## - Fold10: mtry= 7, min.node.size=1, splitrule=gini 
## + Fold10: mtry=13, min.node.size=1, splitrule=gini 
## - Fold10: mtry=13, min.node.size=1, splitrule=gini 
## + Fold10: mtry= 2, min.node.size=1, splitrule=extratrees 
## - Fold10: mtry= 2, min.node.size=1, splitrule=extratrees 
## + Fold10: mtry= 7, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 89%. Estimated remaining time: 3 seconds.
## - Fold10: mtry= 7, min.node.size=1, splitrule=extratrees 
## + Fold10: mtry=13, min.node.size=1, splitrule=extratrees 
## Growing trees.. Progress: 90%. Estimated remaining time: 3 seconds.
## - Fold10: mtry=13, min.node.size=1, splitrule=extratrees 
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 13, splitrule = gini, min.node.size = 1 on full training set
print(rf_LAX_categorical)
## Random Forest 
## 
## 84702 samples
##    13 predictor
##    15 classes: '-2', '-1', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 76233, 76231, 76230, 76232, 76233, 76235, ... 
## Resampling results across tuning parameters:
## 
##   mtry  splitrule   Accuracy        Kappa         
##    2    gini        0.986470195730  0.981985649985
##    2    extratrees  0.892552988835  0.855630571068
##    7    gini        0.999940980665  0.999921434979
##    7    extratrees  0.999409693752  0.999214094673
##   13    gini        1.000000000000  1.000000000000
##   13    extratrees  1.000000000000  1.000000000000
## 
## Tuning parameter 'min.node.size' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were mtry = 13, splitrule = gini
##  and min.node.size = 1.
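A cross-validated accuracy of exactly 1.000 is a strong hint of target leakage rather than a perfect model. In the BTS on-time data, `ARR_DELAY_GROUP` is defined as 15-minute bins of `ARR_DELAY` (below -15 minutes is group -2, 180 minutes or more is group 12), so if `ARR_DELAY` is one of the 13 predictors, the forest only has to learn the bin edges. The sketch below assumes that definition and checks it on made-up delays; the helper name `delay_group()` is hypothetical. A leakage-free classifier would drop `ARR_DELAY` (and any other column derived from the arrival time) before calling `train()`, and the same check applies in reverse to the regression model if `ARR_DELAY_GROUP` was among its predictors.

```r
# Hypothetical helper reproducing the BTS 15-minute arrival-delay groups
# (assumed definition: < -15 min -> -2, -15..-1 -> -1, 0..14 -> 0, ..., >= 180 -> 12)
delay_group <- function(arr_delay) {
  pmin(pmax(floor(arr_delay / 15), -2), 12)
}

# Made-up arrival delays in minutes, chosen to sit on the bin edges
arr_delay <- c(-40, -16, -15, -1, 0, 14, 15, 29, 179, 180, 600)

# The group is a deterministic function of the delay, so a model that
# sees ARR_DELAY can recover ARR_DELAY_GROUP without any real prediction
data.frame(ARR_DELAY = arr_delay, ARR_DELAY_GROUP = delay_group(arr_delay))
```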
rf_LAX_predict_categorical <- predict(rf_LAX_categorical, LAX_outbound_flight_data_random_forest_validation_2)
rf_LAX_confusion_matrix <- confusionMatrix(rf_LAX_predict_categorical, LAX_outbound_flight_data_random_forest_validation_2$ARR_DELAY_GROUP)
rf_LAX_confusion_matrix$table
##           Reference
## Prediction   -2   -1    0    1    2    3    4    5    6    7    8    9
##         -2 6260    0    0    0    0    0    0    0    0    0    0    0
##         -1    0 7498    0    0    0    0    0    0    0    0    0    0
##         0     0    0 3480    0    0    0    0    0    0    0    0    0
##         1     0    0    0 1430    0    0    0    0    0    0    0    0
##         2     0    0    0    0  739    0    0    0    0    0    0    0
##         3     0    0    0    0    0  442    0    0    0    0    0    0
##         4     0    0    0    0    0    0  313    0    0    0    0    0
##         5     0    0    0    0    0    0    0  225    0    0    0    0
##         6     0    0    0    0    0    0    0    0  179    0    0    0
##         7     0    0    0    0    0    0    0    0    0  120    0    0
##         8     0    0    0    0    0    0    0    0    0    0   95    0
##         9     0    0    0    0    0    0    0    0    0    0    0   59
##         10    0    0    0    0    0    0    0    0    0    0    0    0
##         11    0    0    0    0    0    0    0    0    0    0    0    0
##         12    0    0    0    0    0    0    0    0    0    0    0    0
##           Reference
## Prediction   10   11   12
##         -2    0    0    0
##         -1    0    0    0
##         0     0    0    0
##         1     0    0    0
##         2     0    0    0
##         3     0    0    0
##         4     0    0    0
##         5     0    0    0
##         6     0    0    0
##         7     0    0    0
##         8     0    0    0
##         9     0    0    0
##         10   70    0    0
##         11    0   49    0
##         12    0    0  217
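Every validation flight sits on the diagonal, so the hold-out accuracy is also 100%, matching the perfect cross-validated accuracy and again suggesting that the outcome is leaking into the predictors. As a base-R sketch, overall accuracy and per-group recall can be read straight off such a table; the counts below are copied from the diagonal above (all off-diagonal cells are zero), and caret's `confusionMatrix()` reports the same quantities in its `overall` and `byClass` elements.

```r
# Diagonal counts from the validation confusion matrix above (groups -2 to 12);
# every off-diagonal cell in that table is zero
groups <- as.character(-2:12)
counts <- c(6260, 7498, 3480, 1430, 739, 442, 313, 225, 179, 120, 95, 59, 70, 49, 217)
cm <- diag(counts)
dimnames(cm) <- list(Prediction = groups, Reference = groups)

n_validation <- sum(cm)                     # flights in the validation set
accuracy     <- sum(diag(cm)) / n_validation
recall       <- diag(cm) / colSums(cm)      # per-group sensitivity

n_validation
accuracy
round(recall, 3)
```

The counts also show how imbalanced the groups are (7,498 flights in group -1 versus 49 in group 11), so for a leakage-free model the overall accuracy would be dominated by the early and on-time groups, and per-group recall or Kappa would be more informative.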